Athletes have embraced new technologies to improve and enjoy their sporting activities. For example, Strava (https://www.strava.com/) is a mobile app and website designed for runners and cyclists to track their routes. Surfers are no different. While there are many apps geared towards understanding surf conditions (e.g., http://www.surfline.com/, http://magicseaweed.com/), and more recently, sensors to track individual rides (http://www.traceup.com/), there is a clear void in the middle ground.
I view this middle ground as a simple, personalized application that learns from each individual's lifetime of surfing experience and predicts the quality of a surf session from a few simple inputs. In other words, I seek to help answer the age-old questions every surfer faces: Should I go surfing today? Where should I go surfing today? But the answers to these questions would not be generic (e.g., from surfing websites); rather, they would be tailored to the individual.
After each session, the surfer enters a few key metrics about the surf. These would include, for example:
The user inputs are used to determine the conditions for each session, including:
using the following data sources:
The user inputs and ocean conditions are used for two purposes:
For demonstration, I used a dataset of surf sessions spanning three years in Monterey, California. This surf log was then integrated with hourly buoy data from Monterey Bay from 2013 to present, as well as hourly tidal heights for Monterey Bay over the same period.
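The integration step can be sketched as a join of the surf log to the hourly buoy and tide records. This is a minimal illustration with toy data, not the real Monterey log; the column names (`date_hour`, `WVHT`, `DPD`, `MWD`, `TideHeight`) are assumptions based on the variables described later, and real NDBC and NOAA tide files would need cleaning before a join like this.

```r
# Toy stand-ins for the three data sources (surf log, NOAA buoy, tides)
surf_log <- data.frame(
  date_hour = as.POSIXct(c("2015-01-10 08:00", "2015-01-12 07:00"), tz = "UTC"),
  site      = c("Asilomar", "Capitola"),
  rides     = c(12, 5)
)
buoy <- data.frame(
  date_hour = as.POSIXct(c("2015-01-10 08:00", "2015-01-12 07:00"), tz = "UTC"),
  WVHT = c(1.8, 2.4),   # swell height (m)
  DPD  = c(14, 11),     # dominant swell period (s)
  MWD  = c(305, 280)    # mean wave direction (degrees)
)
tides <- data.frame(
  date_hour  = as.POSIXct(c("2015-01-10 08:00", "2015-01-12 07:00"), tz = "UTC"),
  TideHeight = c(0.6, 1.2)  # tidal height (m)
)

# Attach the ocean conditions to each session by matching on the hour
sessions <- merge(surf_log, buoy,  by = "date_hour")
sessions <- merge(sessions, tides, by = "date_hour")
```

In practice the session times would first be rounded to the nearest hour so they line up with the hourly buoy and tide observations.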
I used the following packages in R:
Below, I plot the monthly number of surf sessions:
Figure 1. Number of surf sessions per month over a three-year period in central California
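The monthly tally behind a plot like Figure 1 reduces to a one-line aggregation. This is a sketch with made-up dates; the real log holds the `date_hour` values from each session.

```r
# Toy session dates standing in for the three-year surf log
session_dates <- as.Date(c("2015-01-10", "2015-01-25", "2015-02-03"))

# Count sessions per calendar month
monthly <- table(format(session_dates, "%Y-%m"))

# Bar chart of sessions per month
barplot(monthly, las = 2, ylab = "Number of surf sessions")
```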
And here I plot the spatial distribution of the surf sessions, summarized in the following ways:
Figure 2. Visualizing a surf log from Monterey Bay, California. The colored sites are ones that will be used in model predictions. Note that some sites are infrequently visited but are worth the trip, like Capitola.
The real benefit of the app will be to inform the user whether a particular site, given the current conditions, will be worth surfing. The statistical model will take the current NOAA, tide, and weather conditions and integrate them with the historical user data to predict a metric of quality. As a simple starting point, I fit the following generalized linear model, using a Poisson distribution because the outcome is a count (the number of rides in session i):
\[Rides_{i} \sim Poisson(\mu_{i})\] \[E(Rides_{i}) = var(Rides_{i}) = \mu_{i}\]
\[log(\mu_{i}) = \alpha + \beta_{h}Height_{i} + \beta_{p}Period_{i} + \beta_{d}Direction_{i} + \beta_{t}Tide_{i}\]
where Height is swell height (WVHT; m), Period is the dominant swell period (DPD; s), Direction is the mean wave direction (MWD; degrees), and Tide is the tidal height (TideHeight; m).
I fit the model separately to the surf log at each site. The model can then predict the number of rides as a function of each variable (while holding all others at their median value), which lets me visualize the partial effects for one site below:
Figure 3. Partial effects of each predictor from the generalized linear model fit to the surf log data from one site, Asilomar.
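Partial-effect curves like those in Figure 3 come from predicting over a grid in one variable while pinning the others at their medians. The sketch below re-simulates a small dataset so it stands alone (coefficients are arbitrary), then traces the swell-height effect on the response scale.

```r
# Simulated stand-in for one site's surf log
set.seed(1)
dat <- data.frame(
  WVHT       = runif(200, 0.5, 4),
  DPD        = runif(200, 6, 18),
  MWD        = runif(200, 200, 340),
  TideHeight = runif(200, -0.5, 2)
)
dat$Rides <- rpois(200, exp(1.5 - 0.3 * dat$WVHT + 0.004 * dat$MWD))
fit <- glm(Rides ~ WVHT + DPD + MWD + TideHeight,
           family = poisson, data = dat)

# Vary swell height over its range; hold the other predictors at medians
grid <- data.frame(
  WVHT       = seq(min(dat$WVHT), max(dat$WVHT), length.out = 50),
  DPD        = median(dat$DPD),
  MWD        = median(dat$MWD),
  TideHeight = median(dat$TideHeight)
)
grid$pred <- predict(fit, newdata = grid, type = "response")

# Partial-effect curve for swell height
plot(grid$WVHT, grid$pred, type = "l",
     xlab = "Swell height (m)", ylab = "Predicted rides")
```

Repeating this for each predictor in turn yields the four panels of a partial-effects figure.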
In brief, the model suggests the central California surfer has the best session (in terms of ride count) when the swell height is small (< 4 ft) and the swell direction is from the northwest (> 300°). To a lesser extent, the surfer catches more waves at lower tide and when the swell is larger, but these effects are marginal.
This analysis is based on an example surf log, together with hand-entered data for the swell and tide conditions; in the app, the data scraping will be automated. Moreover, the statistical model can be improved. I foresee the following steps for completing the app over the next three months: